automate spam detection #772
We would need to add the JetStream2 instance IP to ALLOWED_HOSTS.
just missing the migration for
I'll read through it again, but it seemed all good on a first pass, besides having a way to kick off the process.
Regarding the way to start the process from the CoMSES side: something like this?
@asuworks I just remembered there was some additional cleanup I wanted to do eventually with the spam stuff. This might be a good place to get that done if you are up for it: comses/planning#249, namely the second point (refactoring the serializer mixin to actually be just a mixin).
- api/spam/get-latest-batch/ returns the latest set of content to be checked for spam
- api/spam/update updates the status of the content
- a SpamModeration record with status `SCHEDULED_FOR_CHECK` is stored on every Job, Event, Codebase submission. A decoupled external service will query for these objects to check them for spam.
- fix tests
- add asdf & direnv to .gitignore and .dockerignore
- unshelve the JetStream2 instance, trigger the LLM spam check workflow, then shelve the instance again when the workflow is done.
…management command + minor refactoring
…ement command + minor refactoring
…pamModeration object is automatically created for the associated MemberProfile
Can this (everything besides `Command`) be moved to a module in curator/? Maybe spam.py
This PR attempts to automate the spam detection process for `Job`, `Event`, `Codebase` and `MemberProfile` objects using an external LLM service.
LLM Spam Detection Process
1. A `SpamModeration` record with status `SCHEDULED_FOR_CHECK` is stored on every `Job`, `Event` and `Codebase` submission and on every `User` creation (for a `User`, the `SpamModeration` object is attached to the associated `MemberProfile`).
2. The external service fetches the latest batch of `SpamModeration` objects (`api/spam/get-latest-batch/`), analyzes them for spam and submits a spam report to `api/spam/update` for each one of them.
3. `api/spam/update` on the CoMSES side updates the corresponding `SpamModeration` object according to the LLM report from the external service.
Starting the LLM Spam Detection Process
The external service asuworks/comses.spamcheck is deployed on an existing JetStream2 instance. A management command unshelves the instance before the spam check workflow is triggered and shelves it again automatically after the workflow is done.
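The unshelve, trigger, shelve sequence could be sketched roughly as below. This is not the management command from this PR: `run_spam_check` and its three callables are hypothetical stand-ins for whatever actually talks to JetStream2 and the spam check service. Shelving in a `finally` block is also an assumption, made so that a failed workflow does not leave the instance running.

```python
from typing import Callable


def run_spam_check(
    unshelve: Callable[[], None],
    trigger_workflow: Callable[[], None],
    shelve: Callable[[], None],
) -> None:
    """Unshelve the instance, run the workflow, then shelve the instance again."""
    unshelve()
    try:
        trigger_workflow()
    finally:
        # Assumption: shelve even when the workflow raises, so the
        # instance is not left running after a failed check.
        shelve()


if __name__ == "__main__":
    # Record the order of the calls instead of touching a real instance.
    calls = []
    run_spam_check(
        unshelve=lambda: calls.append("unshelve"),
        trigger_workflow=lambda: calls.append("trigger"),
        shelve=lambda: calls.append("shelve"),
    )
    print(calls)  # ['unshelve', 'trigger', 'shelve']
```

Passing the three steps in as callables keeps the ordering logic testable without cloud credentials.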
Environment & Secrets
The following environment variables must be set:
JetStream2 Credentials
The credentials can be found here: https://js2.jetstream-cloud.org/identity/application_credentials/
- secrets/llm_spam_check_jetstream_os_application_credential_secret
- secrets/llm_spam_check_jetstream_os_application_credential_id
X-API-Key header for the API
Access to the `api/spam/update` and `api/spam/get-latest-batch` routes is protected by `X-API-Key` header verification. The key should be set in secrets/llm_spam_check_api_key
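For illustration, a header check of this kind might look like the sketch below. `has_valid_api_key` is a hypothetical helper, not the code in this PR, and the constant-time comparison and the rejection of empty keys are assumptions about how one would want such a check to behave.

```python
import hmac
from typing import Mapping


def has_valid_api_key(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True only if the X-API-Key header matches the configured key."""
    # A plain dict lookup is case-sensitive; Django's request.headers is a
    # case-insensitive mapping, so the same lookup works there as well.
    provided = headers.get("X-API-Key", "")
    if not expected_key or not provided:
        # Reject when either side is empty, e.g. the secret was never configured.
        return False
    # compare_digest takes time independent of where the two values differ.
    return hmac.compare_digest(provided.encode("utf-8"), expected_key.encode("utf-8"))
```

The expected key would be read from secrets/llm_spam_check_api_key on the CoMSES side, and the external service would send the same value in the `X-API-Key` header of every request.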
ALLOWED_HOSTS
The IP of the JetStream2 instance must be added to Django's `ALLOWED_HOSTS`.
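One way to do this without hard-coding the address is sketched below. The helper name `with_spam_check_host` and the environment variable `LLM_SPAM_CHECK_INSTANCE_IP` are hypothetical, and the host names are placeholders rather than the project's real settings.

```python
import os
from typing import Iterable, List, Mapping


def with_spam_check_host(
    allowed_hosts: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return allowed_hosts plus the JetStream2 instance IP, if one is configured."""
    hosts = list(allowed_hosts)
    # LLM_SPAM_CHECK_INSTANCE_IP is a hypothetical variable name.
    instance_ip = env.get("LLM_SPAM_CHECK_INSTANCE_IP", "").strip()
    if instance_ip and instance_ip not in hosts:
        hosts.append(instance_ip)
    return hosts


# In settings.py this could be used as (placeholder host name):
ALLOWED_HOSTS = with_spam_check_host(["comses.example"])
```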